How Can We Analyze Differentially-Private Synthetic Datasets?
Abstract
Synthetic datasets generated within the multiple imputation framework are now commonly used by statistical agencies to protect the confidentiality of their respondents. More recently, researchers have also proposed techniques for generating synthetic datasets that offer the formal guarantee of differential privacy. While combining rules have been derived for the first type of synthetic dataset, little has been said about the analysis of differentially private synthetic datasets generated with multiple imputations. In this paper, we show that the usual combining rules cannot be used to analyze synthetic datasets that have been generated to achieve differential privacy. We consider specifically the case of generating synthetic count data with the beta-binomial synthesizer, and illustrate our discussion with simulation results. As a simple alternative, we also propose a Bayesian model that explicitly models the mechanism for synthetic data generation.
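To make the beta-binomial synthesizer mentioned in the abstract more concrete, here is a minimal sketch of how such a synthesizer is commonly described: a success probability is drawn from the Beta posterior given the confidential count, and a synthetic count is then drawn from a binomial with that probability. The function name, parameter names, and the choice to redraw the probability for each synthetic dataset are assumptions made for illustration and are not taken from the paper. The privacy level achieved depends on the prior parameters `alpha1` and `alpha2`, and the sketch does not check that relationship.

```python
import random


def beta_binomial_synthesize(x, n, alpha1, alpha2, m, num_datasets, seed=None):
    """Draw synthetic counts from a beta-binomial posterior predictive.

    Hypothetical helper for illustration only (not the paper's code).
    x, n            : confidential count of successes out of n records
    alpha1, alpha2  : Beta prior parameters (assumed to be chosen by the
                      data releaser to reach the desired privacy level)
    m               : number of records in each synthetic dataset
    num_datasets    : number of synthetic datasets to generate
    """
    rng = random.Random(seed)
    synthetic_counts = []
    for _ in range(num_datasets):
        # Posterior draw of the success probability given the real data.
        p = rng.betavariate(alpha1 + x, alpha2 + n - x)
        # Binomial(m, p) draw, written as a sum of Bernoulli trials.
        count = sum(1 for _ in range(m) if rng.random() < p)
        synthetic_counts.append(count)
    return synthetic_counts


# Example: 30 successes out of 100 records, uniform Beta(1, 1) prior,
# five synthetic datasets of 100 records each.
counts = beta_binomial_synthesize(30, 100, 1.0, 1.0, m=100, num_datasets=5, seed=0)
print(counts)
```

Larger prior parameters pull the synthetic counts toward the prior and away from the confidential data. This extra noise is part of the data-generating mechanism, which is the kind of feature the abstract says an analysis should model explicitly.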
Related Resources
Thesis Proposal: Creation and Analysis of Differentially-Private Synthetic Datasets
Statistical agencies face two conflicting objectives: protecting the privacy of their respondents, and providing researchers and policy makers with useful data. There is a large body of literature on Statistical Disclosure Limitation (SDL) techniques, describing and evaluating methods for statistical agencies to share collected information with users while satisfying their confident...
Differentially Private Synthesization of Multi-Dimensional Data using Copula Functions
Differential privacy has recently emerged in private statistical data release as one of the strongest privacy guarantees. Most existing techniques that generate differentially private histograms or synthetic data work well only for single-dimensional or low-dimensional histograms. They become problematic for high-dimensional and large-domain data due to increased perturbation error and c...
On the Privacy Properties of Variants on the Sparse Vector Technique
The sparse vector technique is a powerful differentially private primitive that allows an analyst to check whether queries in a stream are greater than or less than a threshold. This technique has a unique property: the algorithm works by adding noise with a finite variance to the queries and the threshold, and guarantees privacy that only degrades with (a) the maximum sensitivity of any one quer...
Final Document: Improving Utility of Differentially Private Confidence Intervals
A differentially private randomized algorithm, M, is one meeting the following requirement: given two neighboring datasets d and d′, that is, datasets that differ in no more than one row, and a set of outcomes S, the condition Pr[M(d) ∈ S] ≤ e^ε Pr[M(d′) ∈ S] holds for some ε ≥ 0. Differentially private algorithms run on datasets can provide the guarantee that the information of any one co...
Personalized and Private Peer-to-Peer Machine Learning
The rise of connected personal devices together with privacy concerns call for machine learning algorithms capable of leveraging the data of a large number of agents to learn personalized models under strong privacy requirements. In this paper, we introduce an efficient algorithm to address the above problem in a fully decentralized (peer-to-peer) and asynchronous fashion, with provable converg...
Publication date: 2011